Sorting by Recursive Partitioning

Author

  • Daniel M. Chapiro

Abstract

We present a new O(n lg lg n) time sorting algorithm that is more robust than O(n) distribution sorting algorithms. The algorithm uses a recursive partition-concatenate approach, partitioning each set into a variable number of subsets using information gathered dynamically during execution: each sequence is partitioned according to statistical information computed for that sequence during the sort. Space complexity is O(n) and is independent of the order and distribution of the data. If the data is originally in a list, only O(√n) extra space is necessary. The algorithm is insensitive to the initial ordering of the data, and it is much less sensitive to the distribution of the sorting-key values than distribution sorting algorithms are. Its worst-case time is O(n lg lg n) across all distributions that satisfy a new “fractalness” criterion. This condition, which is sufficient but not necessary, is satisfied by any set with bounded-length keys and bounded repetition of each key. If the condition is not satisfied, worst-case performance degrades gracefully to O(n lg n). In practice, this occurs when the density of the distribution over O(n) of the keys is a fractal curve (for sets of numbers whose values are bounded), or when the distribution has very heavy tails with arbitrarily long keys (for sets of numbers whose precision is bounded). In some preliminary tests, it was faster than Quicksort for sets of more than 150 elements. The algorithm is practical, works essentially “in place”, can be easily implemented, and is particularly well suited both for parallel processing and for external sorting.
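The abstract does not spell out the partitioning rule, so the following is only a minimal Python sketch of the general recursive partition-concatenate idea, not Chapiro's algorithm. Assumptions made here: integer keys (so the bucket arithmetic is exact), the subset's own minimum and maximum as the "statistical information", equal-width value ranges, and about √n subsets per call. That last choice is illustrative: when the splits are balanced, the recurrence T(n) = √n·T(√n) + O(n) solves to O(n lg lg n). The names `partition_sort`, `cutoff` and `max_depth` are hypothetical, and the `max_depth` fallback to `sorted()` is an illustrative safeguard that only loosely echoes the graceful degradation to O(n lg n) described above; it is not the paper's fractalness criterion.

```python
import math
from typing import List


def partition_sort(keys: List[int], cutoff: int = 16, max_depth: int = 64,
                   _depth: int = 0) -> List[int]:
    """Recursive partition-concatenate sort of integer keys (illustrative sketch only).

    Each call computes the min and max of its own subset, splits the subset into
    about sqrt(n) equal-width value ranges, sorts each range recursively, and
    concatenates the sorted ranges in order.
    """
    n = len(keys)
    # Small subsets, or recursion that has gone too deep, fall back to a comparison sort.
    if n <= cutoff or _depth >= max_depth:
        return sorted(keys)
    lo, hi = min(keys), max(keys)  # statistics gathered for this subset only
    if lo == hi:
        return list(keys)  # every key is equal, so the subset is already sorted
    m = max(2, math.isqrt(n))  # hypothetical choice: about sqrt(n) subsets
    span = hi - lo + 1
    buckets: List[List[int]] = [[] for _ in range(m)]
    for k in keys:
        # Integer arithmetic puts lo in bucket 0 and hi in a later bucket,
        # so every recursive call receives fewer than n keys.
        buckets[(k - lo) * m // span].append(k)
    result: List[int] = []
    for bucket in buckets:
        # Buckets cover increasing value ranges, so concatenation preserves order.
        result.extend(partition_sort(bucket, cutoff, max_depth, _depth + 1))
    return result
```

For example, `partition_sort([5, -2, 9, 0, 5])` returns `[-2, 0, 5, 5, 9]`. Because each bucket's value range shrinks by a factor of roughly m at every level, the recursion depth for keys of bounded length is bounded by the key length, which is one intuition for why bounded-length keys appear in the abstract's sufficient condition.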

Similar sources

Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques

Objective: The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using artificial intelligence. We also aimed to find the most effective factors for predicting ART success in infertile couples. Materials and Methods: In this cross-sectional study, the data of 486 patients are colle...

Generic top-down discrimination for sorting and partitioning in linear time

We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that discriminators (discrimination functions) can be defined generically, by structural recursion on representations of ordering and equivalence relations. Discriminators improve the asymptotic performance of generic comparison-based sorting and partitioning, and can be implemented not to ex...

A numerical analysis of Quicksort: How many cases are bad cases?

We present numerical results for the probability of bad cases for Quicksort, i.e., cases of input data for which the sorting cost considerably exceeds that of the average. Dynamic programming was used to compute solutions of the recurrence for the frequency distributions of comparisons. From these solutions, probabilities of numbers of comparisons above certain thresholds relative to the averag...

Adaptive Data Partitioning Using Probability Distribution

Many computing problems benefit from dynamic data partitioning—dividing a large amount of data into smaller chunks with better locality. When data can be sorted, two methods are commonly used in partitioning. The first selects pivots, which enable balanced partitioning but cause a large overhead of up to half of the sorting time. The second method uses simple functions, which is fast but requir...

Optimal Clustering of Relations to Improve Sorting and Partitioning for Joins

The sorting or partitioning of relations is very common in relational database systems. Implementations of the join operation include the sort–merge join algorithm, which sorts both relations, and the hash join algorithm, which usually partitions both relations. We describe how clustering records using an optimal multi-attribute hash (MAH) file, taking the query pattern and distribution into acc...

Publication date: 1998